Project 1: Feature Detection and Matching

By Kevin Yang (ky238@cornell.edu)

Note: I am aware that the results of this project are poor, and I believe the problem lies in my MOPS descriptor. Despite spending a significant amount of time searching for the error, I was unable to find it. I would really appreciate it if someone could point out my mistake.

Feature Descriptor

For my feature descriptor, I implemented a simplified version of the Scale Invariant Feature Transform (SIFT) descriptor that captures only its basic idea. I computed the edge orientations in the 16x16 window around each detected feature and accumulated them into a single histogram with 8 bins. Unlike the full SIFT descriptor, I did not split the window into a 4x4 grid of cells, each with its own histogram; instead, I kept all the angular data of the window in one histogram, so my descriptor has 8 dimensions rather than SIFT's 128.
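The single-histogram descriptor described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the results in this report: the function name `orientation_histogram`, the central-difference gradients, weighting each vote by gradient magnitude, and the final unit-length normalization are all assumptions. The image is a plain 2D list of grayscale values so that the example needs no external libraries.

```python
import math


def orientation_histogram(image, x, y, window=16, bins=8):
    """Hypothetical sketch of a simplified SIFT-like descriptor:
    one orientation histogram over the whole window around (x, y).

    image: 2D list of grayscale values indexed as image[row][col].
    Assumes the window plus a 1-pixel border lies inside the image.
    """
    half = window // 2
    hist = [0.0] * bins
    for row in range(y - half, y + half):
        for col in range(x - half, x + half):
            # Central-difference gradients (assumption: no prior smoothing).
            dx = (image[row][col + 1] - image[row][col - 1]) / 2.0
            dy = (image[row + 1][col] - image[row - 1][col]) / 2.0
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)  # in [0, 2*pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            hist[b] += magnitude  # magnitude weighting is an assumption
    # Scale to unit length so the descriptor is less sensitive to contrast.
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist
```

Because the orientations in this sketch are measured in image coordinates rather than relative to a dominant orientation, rotating the patch cyclically shifts the histogram bins. The full SIFT descriptor avoids this by measuring gradient orientations relative to the keypoint's dominant orientation.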

Design Choices

I chose to implement SIFT because of its scale and rotation invariance and its robustness in a wide variety of situations. This is a clear benefit in real-world settings, where images can vary greatly in illumination as well as in scale and rotation. However, given my time constraints, I was unable to implement the full SIFT descriptor and instead implemented a simplified version of it.

Performance

The performance of the program is not particularly good. The algorithms clearly performed better on the Yosemite dataset than on the Graf dataset, because the two Yosemite images differ essentially by a translation, which makes all the descriptors used (simple, MOPS and SIFT) more effective at identifying the correct match. The results are shown below.

Figure: ROC curves for the Yosemite dataset (yose.plot.roc.png)

Graf is a more difficult dataset because the two images were taken from different locations: besides a translation, the second image is also rotated and at a different scale. On this set the simple descriptor suffers badly, since it is robust to neither rotation nor scale changes; it compares only the unoriented 5x5 pixel area around each feature. The poor performance of the MOPS descriptor is somewhat surprising. Intuitively, it should be more effective than the simple descriptor because, unlike the simple descriptor, it is oriented and does not rely on the pixels immediately around a feature (instead, it subsamples a 41x41 region around the Harris feature). The ROC curve for MOPS looks unrealistic. The best performance, unsurprisingly, comes from the SIFT descriptor: being designed to be both scale and rotation invariant, it performs much better than the other two descriptors. The results are shown below.

Figure: ROC curves for the Graf dataset (graf.plot.roc.png)
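For comparison when debugging, here is a minimal sketch of a MOPS-style descriptor. It is not the implementation used for these results, and several details are assumptions: the names `bilinear` and `mops_descriptor`, an 8x8 sample grid with 5-pixel spacing (so the grid covers roughly a 40x40 window, close to the 41x41 region mentioned above), bilinear sampling, the feature orientation `theta` being supplied in radians, and the image having already been blurred to avoid aliasing. The last step normalizes the samples to zero mean and unit variance, which makes the descriptor invariant to affine changes in brightness.

```python
import math


def bilinear(image, x, y):
    """Bilinearly interpolate image[row][col] at real-valued (x, y); zero outside."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    total = 0.0
    for dy, wy in ((0, 1 - fy), (1, fy)):
        for dx, wx in ((0, 1 - fx), (1, fx)):
            r, c = y0 + dy, x0 + dx
            if 0 <= r < h and 0 <= c < w:
                total += wy * wx * image[r][c]
    return total


def mops_descriptor(image, x, y, theta, size=8, spacing=5.0):
    """Hypothetical sketch of a MOPS-style descriptor.

    Samples a size x size grid with the given pixel spacing, rotated by the
    feature orientation theta (radians) about (x, y), then normalizes the
    samples to zero mean and unit variance.
    Assumes the image has already been low-pass filtered.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    samples = []
    for i in range(size):
        for j in range(size):
            # Grid offsets in the feature's own frame, centered on (0, 0).
            u = (j - (size - 1) / 2.0) * spacing
            v = (i - (size - 1) / 2.0) * spacing
            # Rotate the offset into image coordinates and translate to (x, y).
            px = x + cos_t * u - sin_t * v
            py = y + sin_t * u + cos_t * v
            samples.append(bilinear(image, px, py))
    mean = sum(samples) / len(samples)
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    if std == 0:
        return [0.0] * len(samples)
    return [(s - mean) / std for s in samples]
```

The sign convention of the rotation matters: with image rows increasing downwards, a positive `theta` in this sketch turns the sampling grid clockwise on screen, so it must match the convention used when the feature orientation is estimated.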

Descriptor   Matching     graf       leuven     bikes      wall       Average
Simple       SSD          0.610545   0.430206   0.466590   0.303750   0.4527728
Simple       Ratio test   0.675835   0.647325   0.675835   0.637598   0.6591483
MOPS         SSD          0.110549   0.134677   0.365504   0.083914   0.173661
MOPS         Ratio test   0.149928   0.180537   0.050388   0.099588   0.1201103
MySIFT       SSD          0.621322   0.226715   0.345660   0.176709   0.342602
MySIFT       Ratio test   0.403039   0.299812   0.380727   0.169910   0.313372
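The two rows per descriptor in the table correspond to two ways of scoring a match. A minimal sketch of both follows, assuming descriptors are plain lists of floats. The names `ssd` and `match_features` are hypothetical, and taking the ratio of squared distances (rather than of Euclidean distances, as in Lowe's original ratio test) is an assumption.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_features(desc1, desc2, use_ratio=False):
    """For each descriptor in desc1, return (index_in_desc2, score).

    use_ratio=False: score is the SSD to the nearest neighbour.
    use_ratio=True:  score is SSD(best) / SSD(second best).
    Lower scores mean more confident matches in both cases.
    Assumes desc2 contains at least two descriptors.
    """
    matches = []
    for d in desc1:
        # Sort candidates by distance; keep the two nearest neighbours.
        dists = sorted((ssd(d, e), j) for j, e in enumerate(desc2))
        (best, j_best), (second, _) = dists[0], dists[1]
        if use_ratio:
            # Two exact matches are fully ambiguous, so give the worst ratio.
            score = best / second if second > 0 else 1.0
        else:
            score = best
        matches.append((j_best, score))
    return matches
```

The ratio test penalizes ambiguous matches, such as features in repetitive texture whose two nearest neighbours are almost equally close, which a plain SSD threshold would accept.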

Strengths & Weaknesses

One strength of the baseline SIFT descriptor is its robustness even in difficult situations (e.g. changes in rotation and scale), which helps explain its popularity in the computer vision community. The MOPS descriptor performed surprisingly poorly; this is more likely due to an error in my implementation than to the algorithm itself. MySIFT, surprisingly, did not perform as well as the simple descriptor. Perhaps this is because